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ABSTRACT 

Observations of the EoR with the 21-cm hyperfine emission of neutral hydrogen (HI) 
promise to open an entirely new window onto the formation of the first stars, galaxies 
and accreting black holes. In order to characterize the weak 21-cm signal, we need to de¬ 
velop imaging techniques which can reconstruct the extended emission very precisely. Here, 
we present an inversion technique for LOFAR baselines at NCR based on a Bayesian for¬ 
malism with optimal spatial regularization, which is used to reconstruct the diffuse fore¬ 
ground map directly from the simulated visibility data. We notice the spatial regularization 
de-noises the images to a large extent, allowing one to recover the 21-cm power-spectrum 
over a considerable k± — ku space in the range of 0.03 Mpc -1 < k± < 0.19 Mpc -1 and 
0.14 Mpc -1 < fcy < 0.35 Mpc -1 without subtracting the noise power-spectrum. We find 
that, in combination with using the GMCA, a non-parametric foreground removal technique, 
we can mostly recover the spherically average power-spectrum within 2er statistical fluctua¬ 
tions for an input Gaussian random rms noise level of 60 mK in the maps after 600 hrs of 
integration over a 10 MHz bandwidth. 

Key words: method:data analysis, techniques: interferometric-radio continuum, general- 
(cosmology:) diffuse radiation 


1 INTRODUCTION 


Observations of redshifted 21 -cm radiation from the large scale dis¬ 
tribution of neutral hydrogen (HI) is one of the most promising 
probes to study the high redshift Universe. The Epoch of Reion¬ 
ization (EoR) marks an important milestone in the large scale his¬ 
tory of the Universe. The timing, duration and character of sub¬ 
sequent events which led to reionization of the Universe contain 
an enormous amount of information about the first galaxies and 


stars 


(Furlanetto et al.|2006| |Morales & Wyithe|2010 |Mellema 


af|20L)|. Evidence from quasar absorption spectra |Becker*eT'ar 


i et 


2001 1 Fan et ah 2002| and the Cosmic Microwave Background 


Radiation (CMBR) i Spergel et al.||2007| |Page et ah||2007[ |Lar-| 
|son et al~||20 1 1 1 |Planck Collaboration et al. |2015 1 together im¬ 
ply that the neutral hydrogen was reionized over a redshift range 
6^2^15, but the exact details of how the Universe was slowly 
reionized, what were the first sources of radiation in the Universe, 
how did the EoR influence structure formation ([Barkana & Loeb 


2001: [Loeb & Barkanat200l| [Pritchard & Loebl2012[ [Robertson 


et al.|2013) are presently unknown. Through observations of red¬ 
shifted 21 -cm line as a function of redshift and angular position, we 


* E-mail: ghosh@astro.rug.nl 


aim to directly measure the HI distribution in the Universe which 
will eventually shed light on the large scale structure formation dur¬ 
ing this epoch. Currently, several low-frequency instruments such 
as Low Frequency ARray (LOFArQ Murchison Widefield Array 
( MWA ] J [ P recision Array for Probing the Epoch of Reionization 
( PAPERf] Giant Metrewave Radio Telescope (GMRtQ are car¬ 
rying out observations specifically for detecting red-shifted 21-cm 
HI signal. There are also ongoing eff orts to build up interferome¬ 
ters such as MITEoR (Zheng et al.|20 14l and HER^ which use 
massive baseline redundancy to enable automated precision cali¬ 
bration, reduce system noise and cut down on computational cost 
of the correlator. 

Unfortunately, the cosmological 21-cm signal is very faint 
such that no current instrument can detect it directly. It is perceived 
that through a statistical analysis of the fluctuations in the 21-cm 
radiation it is possible to observe the high redshift 21-cm signal, 
though currently only upper limits have been placed jPaciga et al.| 


1 http://www.lofar.org/ 

2 http://www.haystack.mit.edu/arrays/MWA 

3 http://astro.berkeley.edu/ dbacker/eor/ 

4 http://www.gmrt.ncra.tifr.res.in 
° http://reionization.org/ 
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|2013||Dtilon et al.|2014||Ali et al.|2015j ). Regarding the foregrounds 
which are at least four orders of magnitude larger than the 21-cm 
signal, it is expected that individual point sources can be identi¬ 
fied in the images and subsequently be removed to a flux level de¬ 
pending on the sensitivity of the instrument. The contribution from 
the synchrotron emission {Shaver et al.| 1999ft from our Galaxy is 
another major diffuse foreground component for detecting the red- 
shifted 21-cm HI signal. Other foreground sources include free-free 
emission from ionizing halos ( |Oh & Mack|2003] l and unresolved 
extra-galactic sources which could be also large enough compared 
to the HI signal. The confusion noise generated from the sea of 
point sources below the detection limit can also be a major obstacle 
in detecting the HI signal {Di Matteo et al.|2002ft . For a dipole array 
such as PAPER, which has a wide field of view, [Parsons & Backer] 
{2009ft have developed a delay rate filtering technique which is used 
to remove the smooth foregrounds and a beam sculpting method de¬ 
pending on the fringe rate of the sources {Parsons et al.|2015ft that 
can be used to select a restricted area near the phase centre to min¬ 
imize the sidelobe contribution coming from sea of point sources 
away from the field centre. 

Radio surveys at 408 M Hz {Haslam et aI7|| 1982 , 1.42 GHz 
I Reich|[l982l |Reich & Reich|| 1988| > and 2.326 GHz Jonas et aT[ 

; 1 998ft have measured the diffuse Galactic synchrotron radiation 
at larger than ~ 1° angular scales. At relatively higher frequen¬ 
cies, |Giardino et al,| {2001ft and |Giardino et al.| {2002ft have an¬ 
alyzed the fluctuations in the Galactic synchrotron radiation us¬ 
ing the 2.3 GHz Rhodes Survey and the 2.4 GHz Parkes radio 
continuum and polarization survey. The structure of the Galactic 
synchrotron emission is not well quantified at the frequencies and 
angular scales relevant for detecting the cosmological 21-cm sig¬ 
nal. Though, the Global Sky Model (GSM) of |de Oliveira-Costa et| 
[aT7| {2008} , which is compiled from publicly available total power 
large-area radio surveys at frequencies 10, 22, 45,408 MHz and 
1.42, 2.326, 23, 33,41, 61,94 GHz gives us a representative idea 
of the diffuse Galactic radio emission at a common angular resolu¬ 
tion of 5° and with an accuracy of 1 — 10%. Recently,[Bernardi et al. 
{2009ft and |Ghosh et al.| ( [2012ft have analyzed 150 MHz WSRT and 
GMRT observations respectively where they find that the Galactic 
synchrotron emission is the most dominant foreground at angular 
scale > 10 arcminute after point source subtraction at 10 — 20 mjy 
level. A precise characterization and a detailed understanding of 
the Galactic synchrotron emission is needed to faithfully remove 
foregrounds in 21-cm experiments. The study of the Galactic syn¬ 
chrotron emission is itself interesting. This will eventually shed 
light on the cosmic ray electron distribution, the strength and struc¬ 
ture of the Galactic magnetic field, and the magnetic turbulence 
(Wael kens et al.||~2009[ |Lazarian & Pogosyan||2012[ |Iacobelli et| 
[aTpoT3| ). It is also interesting to point out that Galactic polarized 
emission can mimic the EoR signal in frequency direction and a 
proper calibration of the instrument is essential so that the leakages 
of the polarized to the total signal is minimal {Jelic et al.|2010||Geil| 
|et al.|201 H|Asad et al.|2015ft . 

However, interferometers inherently give us a dirty map, 
which is the convolution of the true map and direction dependent 
point spread functions (PSFs) or synthesized beams correspond¬ 
ing to the sampling function of the telescope. In general the PSFs 
are direction-dependent and typically not invertible {Dillon et al.| 
|2014ft . Therefore, we need to develop techniques to accurately char¬ 
acterize (and subsequently remove) the diffuse foregrounds for ev¬ 
ery image pixel in the map. The ultimate aim is to develop cali¬ 
bration and imaging software which will take into account all the 
direction dependent effects such as due to ionosphere {Vedantham] 


& Koopma ns[2014ft or the primary beam and direction independent 
(such as instrument gains) calibration errors (Yatawatta et al. 20081. 
We note that in the current generation of imaging software for LO- 
FAR the calibration solution, in the radio interferometric measure¬ 
ment equation that describes the instrument and the ionosphere, is 
only applied to the model components (mainly compact sources) in 
specific directions. In general only a single directionally indepen¬ 
dent station based solution is applied to the residual image. Hence, 
all images produced by the available imaging software will include 
some residual calibration errors which vary from pixel to pixel and 
can pose a potential obstacle for EoR detection and its characteriza¬ 
tion. Also, the polarized foregrounds leaked from Stokes Q, U to I 
maps can mimic the EoR signal if not properly calibrated. Though, 
recent results from |Asad et al.|{20 1 5ft show that in case of LOFAR a 
modest polarimetric calibration is sufficient to ensure that the polar¬ 
ization leakage remains below the EoR signal inside the Full Width 
at Half Maximum (FWHM) of the Primary Beam (PB) of LOFAR. 
The polarization leakage will also influence the diffuse foreground 
characterization and modeling for stokes I images. In this paper, 
we examine a Maximum Likelihood (ML) inversion methodology 
based on a Bayesian formalism to infer the deconvolved map di¬ 
rectly from the simulated visibilities for the Low Frequency array 
(LOFAR). We note that the technique is quite general and can be 
easily exported for other interferometers to effectively deconvolve 
the effects of instrumental point spread function. 

An important point to note here is that the fundamental con¬ 
cept of the proposed method is based on a Bayesian inference net¬ 
work where we can incorporate two levels of inference regarding 
the estimation of the foreground model. Initially, we fit a fore¬ 
ground model to the data to determine the free model parameters 
and in the second level of inference we can also rank these mod¬ 
els. This analysis technique automatically incorporate ‘Occam’s ra¬ 
zor’ which ensures that overly complex foreground models will 
not be preferred over simpler models unless the data support them 
{MacKay|| 1992ft . Regarding the choice of prior in the Bayesian 
framework, the proposed technique does not strongly depend on 
the inclusion of right priors which will influence the final outcome, 
rather many different priors can be tried (such as identity, gradi¬ 
ent or curvature regularization functions) and finally the data will 
inform us which is the most appropriate prior based on the calcu¬ 
lated evidence values. We can also maximize the evidence in light 
of the data for finding the regularization constant which will set the 
strength of the prior in the penalty function. 

Based on our simulated ML solutions, we also study how good 
we are in recovering the input EoR power spectrum if we only have 
a diffuse foreground model with unresolved extra-galactic sources, 
EoR signal and Gaussian random noise. We use a non-parametric 
foreground removal technique (Generalized Morphological Com¬ 
ponent Analysis, GMCA) to remove the foregrounds. GMCA uses 
a sparsity-based blind source separation (BSS) technique {Bobin| 
|et. al. |2008] l to describe the different foreground components which 
are finally combined in different ratios according to the frequency 
of the maps to model the foregrounds {Chapman et al.|2013ft . 

The paper is organized as follows. In Section 2, we describe 
our approach in mathematical formalism. Section 3 discusses the 
templates used for generating the visibility data corresponding to 
the EoR signal and diffuse foreground emission. In Section 4, we 
present our inversion results, while in Section 5 and 6 we describe 
our foreground removal results and power spectrum comparison. 
Finally, in Section 7 we present a summary and possible future ex¬ 
tension of the current work. 
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2 BAYESIAN INFERENCE : FORMALISM 

The fundamental concept of our proposed method is based on a 
Bayesian framework ( jMacKay|1992| l which automatically ensures 
that overly complex models will not be preferred over simpler mod¬ 
els unless the data support them. The Bayesian framework adopted 
in this work automatically incorporates ‘Occam's razor'. In this 
section, we give a general overview of the mathematical formal¬ 
ism which is used to develop the ML technique. Let us consider a 
linear system, 


y=Fs+n (1) 

where V is a vector of data points, F represents the response func¬ 
tion, s are the source flux parameters that we want to infer given 
the data and n are the noise in the data characterized by the co- 
variance matrix Cd (see e.g. Suyu et al. {2006) ). In our case, V 
are the measured visibilities V u (u, v, w), F is the Fourier trans- 
form kernel e - 2 ^i+vm+ w (fi-i^-^-i)) ands are the intensi . 
ties x in the sky that we want to infer (dVl p is the 

pixel area in solid angle, for details we refer the reader to section 
|3.4) . Assuming the noise to be Gaussian, the probability of the data 
given the model parameters s is 


p(y|s) = exp( " gD(y| *)) , 

Z D 

where 

En(V\s) = ^(Fs-yfC^Fs-E) 


and Zo = (27r) JVd ^ 2 (det Cd) 1 ^ 2 is the normalization factor. 

By definition, the most likely solution (sml) maximizes the 
likelihood or minimizes Pd. By setting, VBd(sml) = 0, where 
(V = ^)we find the maximum likelihood (ML) solution being. 


SML = (F h C ^FT^C^V (4) 

In many cases, the problem of finding the most likely solution 
that minimizes the half of the f 2 (here, Pd) is ill-posed and we 
need to introduce priors to regularize the solution of s. 

The posterior probability of the source flux pixels s, given the 
visibility data V, a fixed form of regularization function R and a 
level A of regularization follows from Bayes’ theorem, 


P{s |V,A,R) 


P(V\ S )P(s\R,X) 
P(V\X,R) ’ 


(5) 


where P(y|A, R) is called the evidence which depends on the 
model parameters {A, R}. We choose the model that maximizes 
the evidence. In the first level of inference, the most probable (MP) 
solution maximizes the posterior, which can be written as, 


P{s |V,A,R) 


exp (—M(s)) 
Z M (X) 


( 6 ) 


where M(s) = Pd(s) + APs(s), X is the regularization con¬ 
stant, Eg is called the regularizing function and Z m(A) = 
f d Ns s exp(— M(s)) is the normalizing evidence function. 

In our analysis we choose three most common quadratic forms 


of regularization functions (e.g., zero order, gradient and curva¬ 
ture). We subsequently maximize the evidence to fix the optimal 
regularization function and regularization parameter. This allows 
for a fair comparison between any form of regularization. Although 
we don’t claim to have found the “best” form, we note that for 
reconstructing smooth diffuse foregrounds or the 21-cm signal, 
higher order regularization functions (i.e. gradient or curvature) in 
general work better compared than the zeroth order regularization 
{Suyu et al~1{20()6) ; see also |Koo pmans (20051 where these func¬ 
tion were also used for lensing purposes). Any other form that can 
be conceived can in that case objectively be compared to the ones 
that we chose to test, which are the simplest forms possible. In in¬ 
dex and summation notation the zero order regularization can be 
expressed as 


1 s 

Es(s) = -Jfsl (7) 

i=1 

which tries to minimize the flux at every source pixel. 

For gradient and curvature forms of regularization let s; 1: i 2 
be the source flux at pixel (ii,* 2 ), where i\ and *2 range from 
i\ = 1,..., (Vis and *2 = 1,..., N^ a , in the two dimensions, 
respectively. N s = N\ S N 2 S is the total number of flux pixels. We 
use the following form of gradient regularization. 


^ iVis-l N 2s 

e s (s) = — y' y' [ s *i,i2 _ Sii+i,^] 

*1=1 * 2 =1 
^ N ls N 2s -1 

+ 2 ^ y y "j \ Si ld? ~ S il,*2 + l] (8) 

* 1=1 * 2=1 

which tries to minimize the difference in the flux values between 
adjacent pixels on both dimensions. 

Similarly, the curvature regularization function that we use is 


^ iVi a -2 N 2b 

Es(s) = — y ] y ] [sii,i 2 _ 2si 1 +i,i 2 +sn+2,12] 

*1=1 *2=1 

^ tVls N 2s -2 

+2 y , y ] [ s *i,*2 2si 1 ,* 2 +i+ 5 * 1 ,* 2 + 2 ] ( 9 ) 

*1 = 1 *2=1 


which minimizes the sum of the curvature in both dimensions. 

The most probable solution can be derived from the most 
likely solution. For mathematical convenience, we introduce some 
parameters B and C which are the Hessian of Pd and Ps re¬ 
spectively (B = VVPd(s) and C = VVPs(s)). We define 
the Hessian of M as A = VVM(s), which is equivalent to 
A = B + AC. Following Suyu et al. (2006), VPd(jml) = 0 
gives «ml = B _1 F h Cq 1 E and the most probable (MP) solu¬ 
tion can be estimated by VM(j mp) = 0, which can be written 
as smp = A -1 Bsml- We note that the MP solution depends on the 
regularization constant A since the Hessian A depends on A. 

To find the value of the optimal regularization parameter we 


maximize, 


P( A|V, R) 


P(V|A,R)P(A) 

p(y|R) 


( 10 ) 


Assuming a flat prior in log A, we maximize the evidence in 
order to optimize A. 
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For any quadratic form of the regularization function Es{s ) 
we can write the evidence as. 


P(V |A,R) 

where, 


Z M {\) 

ZnZs(\) ’ 


(ID 


Z m (X) = (27r) JVs/2 (detA)" 1/2 e“ M(SMp) , (13) 

We optimize the log of evidence to find the optimal value of 
regularization constant A. Using eqn.[T2]and[T3]the log of evidence 
can be expressed as. 


N s /2 


(detC)" 1/2 e" ABs(0) , 


( 12 ) 


logP(V|A,R) = — A-Es(smp) — £d(smp) 

- ^ log(det A) + ^ log A + XE s (0) 

+ 7} log(detC) - log(27r) 

+ i ^(detCn 1 ). (14) 


Here, IVd is the number of visibility samples. 

Assuming a flat prior in log A, we maximize the evidence 
P(V|A, R) to find the optimal reg ularizatio n constant A. Solving 
dl ^ gA log P(V|A, R) = 0 (Suyu et ah 2006), we get the following 
non-linear equation for A, 

2APs(smp) = N s — ATr(A _1 C), (15) 


where, N s are the number of sky flux pixels. We note that solving 
this non-linear equation is computationally intensive for inversions 
on a large grid and we find that A = 1/cr 2 , where a 2 is the noise 
variance in the data, is a good choice for setting the value of reg¬ 
ularization constant. Following Suyu et al.| (2006) , we expect M 
evaluated at the optimal A value to be equal to half of the num¬ 
ber of data points (M(smp) ~ Nd/2). We find that A = 1 /<t 2 
follows this quite well and the difference is close to the 1% level. 
We emphasize here that in our analysis we solve eqn. [15] explic¬ 
itly to find the optimal A value with different regularization func¬ 
tions namely, identity, gradient or curvature regularization. We also 
note that regarding the choice of the functional form of prior in this 
framework, our technique does not heavily depend on the inclusion 
of very precise priors (indeed a genuine worry in many Bayesian 
methods, but one that is mitigated in the framework laid out by 
|MacKay| | |1992| which we use here) which will influence the fi¬ 
nal outcome. Rather, different prior families can quantitatively be 
compared (e.g. identity, gradient or curvature regularization func¬ 
tions) and the data inform us which is the most appropriate prior 
(both shape and strength) based on the calculated evidence (i.e. 
the marginalized posterior). For ranking different models the reg¬ 
ularization factor A is a nuisance parameter which ends up being 
marginalized. In general the distribution of A is sharply peaked and 
we can approximate P(A|V, R) by a delta function peaked at its 
most probable value A. Hence, the evidence can be approximated 
by P(U|A, R). In our case, using the non-gridded visibilities we 
find that the gradient regularization works best with peak values 
of log evidence at the optimal A value of —166043, —163919, 
— 164109, respectively, for identity, gradient and curvature regu¬ 
larization functions (Figure^. Based on the Bayes factors, there is 


a very strong evidence towards choosing the gradient regularization 
over the identity or the curvature forms of regularization <|Jeffreys| 
|l961|[Kass and Raftery|20lT| . For the rest of our analysis we there¬ 
fore choose the gradient regularization function. 

In Figure [2] we show the ML maps corresponding to different 
regularization functions. Even though the ML maps look similar, 
based on the evidence values, we decided to use curvature regular¬ 
ization for rest of our analysis. We note that Occam’s razor is im¬ 
plicit in our evidence optimization. If the model parameter space is 
overly-large (for small values of A), Occam’s razor penalizes such 
an overly-flexible model. On the other extreme, for overly-large 
values of A, the model can no longer fit to the data as the model pa¬ 
rameter space is restricted to a limited region. The optimal A value 
lies somewhere in between these two extremes which ensures that 
overly complex models will not be preferred over simpler models 
unless the data support them. Although, it is in principle possible 
to infer a as well, in this paper we assume that we can infer it inde¬ 
pendently. 


3 SIMULATED DATA TEMPLATES 

In this section we describe briefly how the data is generated for 
the ML analysis. We consider templates for the EoR signal and 
diffuse foreground emission from which the visibility samples were 
generated and then we added Gaussian random noise to the real and 
imaginary part of these visibilities. 


3.1 EoR Signal 


We have used the semi-analytic code 21cmFAST (jMesinger &| 
|Furlanetto|2007[|Mesinger et al,|201 l| l to simulate the EoR signal. 
Being a semi-analytical code 21cmFAST treats physical processes 
with approximate methods, but apart from the smaller scales 
(< IMpc) the results tend to agree well with the hydro-dynamical 
simulations of Mesinger et al7|( j201 1| . We note that the EoR signal 
simulation used here is the same as in |Chapman et al,| ( |2012| >. 
We run the code using the standard parameters from seven year 
Wilkinson Microwave Anisotropy Probe (WMAP) Observations, 
( »a n ro , n b , n, erg, fe)=(0.72,0.28,0.046,0.96,0.82,73)( |Komatsu 
et a l |201 1 1 . We initialized the simulation with 1800 3 dark matter 


particles at z = 300. The code uses linear perturbation theory to 
evolve the initial density and velocity fields to the redshifts of the 
EoR. The velocity field used to perturb the initial conditions were 
formed on a grid of 450 3 and then interpolated to a finer grid of 
512 3 . 21cmFAST utilizes an excursion set formalism to find dark 
matter halos. For our simulation, we set 1O 9 M 0 to be the threshold 
for halos contributing ionizing photons. Once the evolved density, 
velocity and ionization fields are obtained, the code computes 
the brightness temperature 5Tb box at each redshift based on the 
following equation, 


STh = 28 mK x(i + 5 )l „(i-|=)(AlL) 

imW) 

where the brightness temperature fluctuation 5Tb is detected as a 
difference from the background CMB temperature Tomb l jField| 
|1958| |1959[ |Ciardi & Madau||2003| l, h is the Hubble constant in 
units of 100 kms -1 Mpc - , xhi is the neutral hydrogen fraction 
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-6 -4-2 0 2 -6 -4 -2 0 2 -6 -4 -2 0 2 

log(X) log(X) log(X) 

Figure 1. From left to right this figure shows the log evidence as a function of log of the regularization parameter A for identity, gradient and curvature 
regularization function respectively. The log evidence is maximum for gradient regularization and the vertical line in the 2nd panel displays the peak of the log 
evidence which is around log(A) = 0.0582, or equivalently A = 1.06. 


ML Map 

Input Map Identity 



-5 -4 -3 -2 -1 0 1 2 3 4 5 

Figure 2. The top left panel shows the input map without any noise. The remaining panels show the ML inverted maps corresponding to identity, gradient and 
curvature regularization functions respectively. Each panel covers a region of 2° x 2°. 
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and and are the baryon and matter densities in critical den¬ 
sity units. For simplicity, we ignore the gradient of the peculiar ve¬ 
locity whose contribution to the brightness temperature is relatively 
small ( jGhara et al.||2014[ [Shimabukuro et al.|2014fr . We also ne¬ 
glected spin temperature fluctuations by assuming T ap i n 3> Tomb, 
i.e. the neutral gas has been heated well above the CMB tempera¬ 
ture during EoR epoch i j Pritchard & Loeb|2008[ >. 


3.2 Diffuse foregrounds 

The diffuse foreground model used in this paper is described in 
great detail in |Jelic et~a l. (2008, 2010). These simulations include 
Galactic Diffuse Synchrotron Emission (GDSE), Galactic local¬ 
ized synchrotron emission, Galactic diffuse free-free emission and 
unresolved extra-galactic foregrounds. The Galactic Diffuse Syn¬ 
chrotron Emission (GDSE) is produced by cosmic ray electrons in 
the magnetic field of the Galaxy ( [Ginzburg & Syrovatskii|1969j l. 
The intensity and the spectral index of the GDSE are modeled as 
Gaussian random fields. The Gaussian fields are generated assum¬ 
ing a power law spatial power spectrum with 2D index of —2.7. 
The GDSE is modeled as a power law as a function of frequency 
with a spectral index of —2.55 ± 0.1 ( [Shaver et al.| 1999} . The am¬ 
plitude of the brightness temperature is set around to 253 K at 120 
MHz. 

Another part of the synchrotron emission comes from the 
galactic localized supernovae remnants (SNRs). Along with the 
GDSE SNRs make up 70 % of the total foreground emission af¬ 
ter point sources are removed. Inside the observational window of 
10° x 10° eight SNRs were randomly placed. We note that due 
to large computational requirement for our current ML inversion 
we have restricted our FoV to 4° x 4° which corresponds to the 
FWHM of the LOFAR primary beam. The SNRs are modeled to 
be extended disks and their angular size, flux density and spectral 
index are randomly chosen from the observed SNR catalog of D.A. 
Green (A Catalogue of Galactic Supernova Remnants, 2006 April 
version^. 

The free-free emission is generated through the thermal 
bremsstrahlung radiation from our galaxy and external galaxies 
from the diffused ionized gas. It is also modeled as a Gaussian 
random filed similar to GDSE, but with a different frequency spec¬ 
tral index fixed to -2.15 ( |Tegmark et al.|2000}|Santos et al.|2005| l. 
Compared to other foreground components, the free free emission 
only contributes 1% to the total foreground emission (Shaver e t al.| 
|1999| , but orders of magnitude higher than the 21-cm signal. 

Unresolved extra-galactic sources, mainly radio galaxies and 
clusters, contribute almost ~27 % of the total foreground emission 
after discrete point sources are removed from the map ( [Jelic et ak| 
|2008[ >. The simulated radio galaxies have a power law spectrum and 
are clustered based on a random walk algorithm. The radio clusters 
are chosen from the cluster catalogue of Virgo Consortiurr]^] and 
using the corresponding relation between the observed mass lumi¬ 
nosity and X-ray radio luminosity functions. 

Figure [3] shows one of the slices at 165 MHz for the diffuse 
foreground and the EoR signal that we use in this analysis. 


6 https://www.mrao.cam.ac.uk/surveys/snrs/ 

' http://www.mpa-garching.mpg.de/galform/virgo/hubble/ 


> 



Figure 4. The figure shows the uv coverage of our LOFAR simulation. 
We do not show the w component here, but in our analysis we take the w 
component into account. The uv points are generated within 30 to 250 uv 
cut. The uv points and the hermitian uv values are shown in this figure. 
Here ( u , v) are the antenna separations in wavelength units at a frequency 
of 165 MHz. 


3.3 Visibility data 

We use the LOFAR-HBA antenna positions (|van Haarlem et al. 


2013) to generate the baseline components (u,v,w) towards the 


North Celestial Pole (NCP) which were used to generate the vis¬ 
ibility data for our simulation. Figure [4] shows the uvw coverage 
that we have used for our simulation. To ensure a manageable data¬ 
set, we have sampled visibilities for each 100 seconds interval for 
a uv track covering the full ± 6 hour angle. The components u and 
v specify the baseline coordinates projected on the plane tangent 
to the NCP, while the third component w of the coordinate frame 
points towards the direction of the NCP. We note that the actual 
sampling of the uvw points in LOFAR observation will be higher, 
but here we restrict our number of visibility samples as we are lim¬ 
ited by the present computational resources. Note that the number 
of visibilities is chosen about equal to the number of independent 
uvw cells, hence after gridding the computational effort remains 
similar even if the number of visibilities goes up. 

The maximum baseline length is restricted to 250 A which cor¬ 
responds to a maximum wavelength on the sky of ~ 13 arcminute. 
In our inversion, we choose a pixel scale of ~ 3 arcminute which 
ensures that we sample the PSF reasonably well. 


3.4 Fourier Kernel Matrix 

A typical radio interferometric array simultaneously measures visi¬ 
bilities at a large number of baselines and frequency channels. Ide¬ 
ally, each visibility records a single mode of the Fourier transform 
of the specific intensity distribution /„ (£, m) on the sky. Represent¬ 
ing the celestial sphere by a unit sphere, the component n can be 
expressed in terms of (l, m) by n(l, m) = v/1 — Z 2 — m 2 . Then 
the measured visibility for monochromatic, unpolarized signal is 
related with the specific intensity distribution through the follow¬ 
ing relation ([Perley et al.| 1989}, 


V v (u,v,w) = j 


I*(lp 


y/l — l 2 — m 2 

xe -2ni(ul+vm+ W (^l-l2-m2-l)) dldm ( j ?) 
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Figure 3. This figure shows a slice at 165 MHz for the diffuse foreground emission and the EoR signal at pixel resolution of 3 arc-min. Note that the mean is 
subtracted from the maps. Each panel covers a region of 2° X 2°. 



We note that without loss of generality (assuming that all sta¬ 
tions have identical beams) we have not incorporated any primary 
beam pattern in the visibility definition which in general can be ab¬ 
sorbed into the specific intensity distribution. To test this, we have 
introduced a Gaussian Primary Beam (PB) in the sky with a FWHM 
of 2 degree within the field of view of 4 by 4 degree and subse¬ 
quently created the input visibility data. During the inversion we 
absorb the PB in the response matrix (as a directionally-dependent 
amplitude to the Fourier kernel) to obtain the corresponding “PB- 
deconvolved” true sky. We compare these solutions with the solu¬ 
tions presented in this paper where we have not introduced any PB 
in our analysis and we find that these two sets of solutions are very 
similar (Figure |5J. We note that here we assume our inversion grid 
is in (, m coordinates and the sky phase centre corresponds to the 
centre of the grid. Each visibility corresponding to a uvw value 
has contribution from all the pixels in the field of view of 4° with a 
pixel resolution of 3 arcmin. To relate it with our matrix notation (in 
Section|2j, we recall that here V is our data matrix V, the Fourier 
kernel, e -2"(“H™+»(v , i-‘ 2 -“ 2 -i)), is the response matrix F 
and x dVl p is the model sky parameters that we want 

to infer from the visibility data. 

In our simulation, we generate visibilities at exactly the same 
(u, v, w) coordinates for all the frequency channels. This ensures 
that for a limited sampling of the uvw points the PSF will not 
vary with frequency which in principle can be major obstacle for 
foreground removal (Bowman et al.|[2009| l. We note that in case 
of a real LOFAR observations, where the uvw sampling is quite 
dense, ensures that the PSF variation with frequency can be made 
very small with proper weighting and gridding. LOFAR observa¬ 
tions of the NCP show that this can be done to a level of better 
than 30 dB (i.e. < 10 -3 ). On scales of a few arcminute, this leads 
to < 1 mK chromatic effects from any remaining compact source 
of a few mJy. For extended emission that we are simulating this 
will be even less. Therefore, in this simulation paper we choose 
not to scale the uvw values with frequency such that it does not 
introduce any additional effects due to our (currently) limited uvw 
samples. We also did a test where we have generated another sets 
of uvw samples by scaling the uvw values (Uo) with frequency 
as U n = Uo(l + nAv/vo) (The number of frequency channels 
n = 20, channel width Av = 0.5 MHz, the middle of the fre¬ 
quency band vq = 170 MHz). Then we only use those visibil¬ 
ities within a maximum uv cut of 250 and redo the ML analy¬ 
sis. We find the resultant ML solutions are quite similar compared 


to uvw values (those with Uo) with no frequency scaling (Figure 
[6)>. Moreover, proper fil tering of sidelobes via delay-rate filtering 
( Vedantham et al.|20 1 2) can mitigate remaining sidelobe leakage. 
We therefore believe our current approximation is justified. 

We simulated visibilities over 20 frequency channels of 0.5 
MHz resolution covering a frequency range of 165 to 175 MHz, 
where the EoR signal peaks in the simulation (Mesinger & Furlam] 
etto 2007, Mesinger et al. 2011). The restricted band-width ensures 
that the evolution of the 21-cm signal along the line of sight direc¬ 
tion due to the light cone effect (Datta et al.||2012} will not be a 
major concern. 

In addition to the signal component each visibility also has a 
random noise contribution. We added Gaussian noise of rms 60 mK 
in the real and imaginary parts of the sampled visibilities to mimic 
actual LOFAR observations. This is close to the expected rms level 
of noise after 600 hours of LOFAR observation (Chapman et al.| 
|2014[ >. We note that this noise value is indicative only and it may 
change somewhat depending upon the observations. 


4 ML INVERSION 

In this section we present our ML solutions based on the visibil¬ 
ity data that we generated (Section [33). The computational effort 
of these linear inversion problem is in general of order ~ JV 3 . For 
N a = 80 x 80 pixels, this leads to 10 11-12 floating-point opera¬ 
tions per frequency channel. To overcome this, we run this inver¬ 
sion on a parallel 2TB memory machine. Also creating the matrices 
from the data itself takes a similar effort. 

However, if we have a good estimate of an initial solution 
(which we can get in principle via FFTs from the dirty images for 
different channels within the frequency bands) then the inversion 
reduced to a ~ W 2 , where N s is the number of sky pixels, where 
the stands for 4-5 iterations in a conjugate gradient (or quasi- 
Newtonian) optimization scheme. Here, because each frequency 
slice looks similar to the adjacent one, we use the ML solution of 
the previous slice as starting point for the subsequent slice. In this 
paper we have done our analysis with both non-gridded and grided 
data to assess whether the solutions differ substantially, when cre¬ 
ating the visibility data matrix, the response matrix (the Fourier 
kernel which is a filled matrix) and the noise covariance matrix. 
For large number of visibility samples this is quite time consuming 
and requires a lot of computer memory. Therefore, we restrict our 
visibility data to a size that we expect in real gridded data (about 
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Figure 5. From left to right, the 2° x 2° panels respectively show the true input map without any noise, the inverted map without the any effect of the PB, the 
PB-deconvolved map (where the PB was applied during creating the visibility data and the beam was included in the Fourier transform kernel to find out the 
corresponding beam deconvolved ML solutions) and the difference between the panel two and three. We see that the difference map is very close to zero apart 
from the corner of the field where the PB drops significantly from the centre of the field and the noise increases considerably which may have some effect on 
the inversion. 


ML Map Difference Map 
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Figure 6. The top left panel shows the input map without any noise. The rest of the panels respectively show the ML inverted map (w/o PB) where the 
uv values ( Uq ) correspond to the start of the frequency band, the 3rd panel show the ML solutions where the uv values (U n ) are scaled according to 
U n = f/o (1 4- nAi'/t'o) and then we have selected a subsection of visibilities from these scaled uv values which lie within a Umax=250 to reconstruct the 
ML solutions. The last panel show the difference between panel 2 and 3 which is mostly close to 0. Each panel covers a region of 2° X 2°. 


128 2 gridded visibilities) as well as choose to restrict our field of 
view to 4 by 4 degree with a pixel resolution of 3 arc-minute where 
(3 arc-minute and above due to the increase of noise below this 
scale) we will be mostly sensitive to detect the 21-cm signal in 
LOFAR. Overall we think these choices are fairly reasonable and 
comparable to grid sampling and fields of view that are used in real 
observations. 


4.1 Resulting inversions 

Figure [7] shows the ML solutions based on the visibility data sets 
that we use. The top most left panel shows the input map, where 
the mean is subtracted from the input template maps as interferom¬ 
eters such as LOFAR is only sensitive to fluctuations in the sky. 
Subsequent maps are the results of the ML inversion with non- 
gridded and gridded visibility data with grid pixel-size of 1, 2, 4 
and 8 lambda respectively. For each of these, we re-optimize the 
regularization parameter to maximize the Bayesian evidence. We 
note that finer binning is always useful since it provides a better 
approximation of the visibility positions in the uvw cube. We av¬ 
eraged the visibilities according to their w components where the 


number of w slices is set by N w = XB/D 2 , where A is the ob¬ 
serving wavelength, B is the maximum baseline length and D is 
the station diameter (Perley et al.|1989fr . The pixel resolution of the 
ML simulation is set to 3 arcminute scale sufficient to recover the 
EoR signal {Mesinger & Furlanetto|2007] l. This also ensures that 
we sub-sample the PSF reasonably well (more than 4 times). We 
note that the ML inversion maps look smoother relative to the input 
map because we do not sample all the small angular scales with 
our limited uvw. Flence, a direct comparison in the image domain 
remains difficult because of this ‘filtering’. 


4.2 Data and Model Coherency and Foreground 
Power-Spectrum 

Next, we computed the visibility coherency matrix between the in¬ 
put data and reconstructed ML solutions. We define the coherency 
matrix as. 


Vcoh 


\{v a ,ME \ 2 

<|14a ta | 2 )<|C M L| 2 } 


(18) 
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Figure 7. The top left panel shows the input map without any noise. The remaining panels show the ML inverted maps for non-gridded and gridded visibility 
data with grid size of A« = 1,2,4, and 8 lambda respectively. For the ML inverted maps we have added 60 mK of noise to the real and imaginary part of the 
visibility data, as expected for LOFAR after 600 hrs of integration. Each panel covers a region of 2° X 2°. 


and evaluate this in bins of baselines and across frequency chan¬ 
nels. For a perfect reconstruction, we expect V co h to be 1 , although 
the effect of noise will lower this value a little (although the noise 
is small compare to the foreground flux). The results are shown in 
Fig. 0 . We notice that apart from A u = 8 , the coherency matrix 
is mostly within 0.99-1.0 and this suggests that the inversion works 
well when the data is not averaged heavily. 

We also computed the power spectrum (PS) of the input map 
and the ML inverted maps corresponding to the gridded and non- 
gridded viability data . We interpolated the PS to common baseline 
values so that we can compare different maps. The resulting PS 


are shown in Figure [9] We find that due to the averaging of the 
visibilities with different grid sizes there is a overall drop in the 
power spectrum on all scales, though the effect is more pronounced 
on smaller scales. 

In Table [I] we tabulated the reduced x ' 2 values for the non- 
gridded and gridded visibility data sets. We note that here we do 
not compute the evidence values as the visibility data changes each 
time for different grid sizes (Au). We define the reduced \ 2 us¬ 
ing the number of degrees of freedom as TVb — 7 , where Nh is the 
number of visibility data (non-gridded or gridded) that is used in the 
inversion and 7 is the right-hand side of equation which deter- 
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Figure 8. The visibility coherency plot between the input data and the corresponding ML non-gridded and gridded visibility data. 
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Figure 9. The spatial power spectrum corresponding to the input and ML 
maps as function of baseline length. Each panel shows a region of 2° X 2°. 


mines the number of “good" parameters determined by the visibil¬ 
ity data. The reduced \ 2 commonly determines the goodness of the 
fit and its value is close to 1. We note that reduced x 2 is not strictly 
1.0 because Bayesian analysis ensures that evidence is maximized 
instead of trying to reach a reduced x 2 of 1.0. We find that for the 
non-gridded visibility data the reduced x 2 is very close to unity. 

Up to Au = 2, the reduced x' 2 value remains near one after 
which it increases. Based on this we decided to carry on our subse¬ 


quent analysis with the visibility data which has been gridded with 
Au = 2, as a good compromise between computational speed in 
the ML inversion and evidence maximization and goodness of fit. 
We note that ML inversion maps contain still the FG and EoR sig¬ 
nal and that the noise in this process is largely filtered. To what 
level this has succeeded will be tested in the subsequent section. 


5 FOREGROUND REMOVAL 

Low frequency radio observations suggest that the foregrounds are 
several orders of magnitude larger than the redshifted 21-cm signal 
( |Bemardi et al.|2009||20ld||Ghosh et al.|20lT||2012] l. Therefore, 
removing the foregrounds is possibly the biggest challenge for de¬ 
tecting the faint cosmological 21-cm signal. The foregrounds are 
expected to have smooth spectra, while the 21-cm signal is line 
emission which varies rapidly with frequency (although possibly 
less rapidly spatially). This property of the HI signal holds the 
promise of allowing us to separate the signal from the foregrounds. 


A possible line of approach is to represent the sky signal as 
an image cube and for each angular position use polynomial fitting 
to subtract out the smooth component of the sky signal that varies 


slowly with frequency (Santos et al. 20051 Wang et al. 2006 [ Me- 

Quinn et al.|2006 [Jelic et al. 2008, 

Liu et al. 2009, Liu, Tegmark 

& Zaldarriaga]20091 Petrovic & Oh 

2011). The residual sky signal 


is expected to contain only the 21-cm signal and noise. We note 
that all these parametric foreground fitting techniques heavily de¬ 
pend on the prior knowledge of the foreground structure which is 
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Figure 10. A slice at 165 MHz of the input foreground, GMCA reconstructed foreground model and the residual after the foregrounds are subtracted. Each 
panel covers a region of 2° x 2°. 


Table 1. The reduced x 2 values evaluated at the regularization constant (A) used in the simulation for non-grid and gridded visibility data for different grid 
sizes (A u). The reduced x 2 is defined as 2 Ed / (N^ — 7 ), where 7 = AT X — ATr(A _1 C) is the right-hand side of eqn. 


Gradient 

Regularization 

Non-Grid 

Grid (A u = 1) 

Grid (Ait = 2) 

Grid (Ait = 4) 

Grid (Ait = 8 ) 

x 2 

1.046 

1.07 

1.14 

1.83 

53.21 


largely unknown at the spatial resolution of the low frequencies 
of interest. |Liu et al7| {2009} also showed that this method of fore¬ 
ground removal has problems which could be particularly severe 
on large baselines if the uv sampling is sparse. On the other hand, 
non-parametric methods do not assume any specific form of the 
foregrounds and use the data to decide on the foreground model 
( [Harker et al.|2009||Chapman et al.|20 1 2[|20 1 3[ . Here, we use the 
Generalized Morphological Component Analysis (GMCA) which 
is a blind source separation technique (BSS) which assumes that a 
wavelet basis exists in which spectrally smooth foreground com¬ 
ponents are sparsely represented with minimal basis coefficients 
{Chapman et al.|2013| that can be separated from the 21-cm signal. 

GMCA utilizes a component separation technique to define 
the foregrounds in order to subtract them, treating the 21-cm signal 
as noise. We model the foregrounds using four components which 
refers to the number of foreground contributions that can be de¬ 
scribed by unique sparse descriptions. We note that as more com¬ 
ponents are added to the model we might expect the 21-cm signal 
itself to leak into the foreground model, although 4 to 5 components 
seem to work well in recovering the 21-cm signal {Chapman et al.| 
|2013| . In figure [TO] we show one of the slice of the reconstructed 
foreground model using GMCA and the residual solution. We find 
an average correlation of > 70% with the true input foreground 
map and the GMCA reconstructed foreground model across the 10 
MHz bandwidth. 


6 DENOISING AND POWER-SPECTRA 

In an ideal situation one would like that after ML inversion the 
residual visibilities will be mostly dominated by noise and the ML 
solutions will be noise free. We find that we recover most of the 
input noises in the residual visibilities and the difference is around 
0.1% compared to the input noise for the non-gridded case. We 
note this is an idealistic situation where we assume that we know 
the noise level in the data and we build up the noise covariance 
matrix using the same noise values during the ML inversion. The 
GMCA residual consists for a large part of the EoR signal and we 
find a clear correlation 50%) between the input EoR signal that 


we use in our simulation and the residual GMCA solutions (Fig¬ 
ure CD- The mean correlation between different slices does not 
change significantly if we smooth the solution with Gaussian fil¬ 
ter of different widths (Figure [TT}, suggesting that the de-noising 
seems scale independent. It is interesting to highlight that if we 
use a dirty map as an input template to GMCA the residual map 
will have contribution from the system noise, the 21-cm signal and 
possibly some foreground residuals. Moreover, the success of the 
signal recovery requires the noise statistics at different frequency 
channels to be determined at an extremely high level of precision. 
On the other hand, we find the ML solutions are de-noised heavily 
and the residual solutions after GMCA are mostly dominated by 
EoR signal. Though, during real observations, the EoR signal de¬ 
tection can be more complicated where any unmodeled foreground 
power and any uncertainties in the bandpass response can prohibit 
such high level of precision to be achieved. 

6.1 Power spectrum: Recovered and Input EoR 

This section describes our comparison between the input EoR and 
the recovered EoR power spectrum after the ML maps are passed 
through GMCA. 

6.1.1 2D Power spectrum 

To test the power spectrum recovery, we compute the cylindrically 
binned 2D power spectra from the GMCA residual maps and com¬ 
pare it with the input EoR power spectrum. We find that the power 
spectrum is still dominated by system noise at higher k± where 
the baseline density drops. In addition, GMCA removes large scale 
modes along the frequency direction and this severely suppresses 
the power-spectrum on the low fey scales. Although, this can be 
mitigated by running GMCA on wider band-width cubes. In our 
comparison, we do not include these high k± and low fcy modes. 
Figure [T3] shows the cylindrically binned 2D power spectra ver¬ 
sus the input EoR signal. Note that, although we did not subtract 
any noise power spectrum from the recovered power spectrum, we 
recover the input EoR power spectrum quite well. This suggests 
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Figure 11. The left panel shows the input eor maps and the right panel 
shows the GMCA residual maps. In the top row no smoothing has been 
applied and in the subsequent rows we applied smoothing with sigma = 0.5, 
1 and 2 respectively for a Gaussian filter with size [5 5] pixels. Each panel 
covers a region of 2° X 2°. 






Figure 12. The correlation co-efficient between the GMCA residual and 
input EoR as a function of frequency. The black horizontal line shows the 
mean of the correlation. We choose the Gaussian filter with size [5 5] pix¬ 
els and different sigma values. Clockwise from the top left panel the fig¬ 
ures show the correlation coefficient corresponding to no smoothing and 
smoothing with sigma = 0.5, 1, 2 respectively. 


that the ML solutions are indeed de-noised. Based on the k± scales 
where we have a reasonable match with the input EoR power spec¬ 
trum and using k± = ^|r where r corresponds to the comoving 
distance at a redshift of 7.35, we find that we are most sensitive 
in recovering the EoR signal at an angular scales of ~ 6 to 17 ar- 
cminute. 

To inspect the PS recovery in the k± and k\\ plane, we cal¬ 
culated the ratio of the input and the recovered EoR PS. Ideally, 


Input EoR 



Recovered EoR 



loglO 

'i* I I I I 

E -3 -2.8 -2.6 -2.4 -2.2 -2 -1.8 

Figure 13. This figure shows the cylindrically binned 2D PS for the input 
and the recovered EoR. 


we expect the ratio of the PS to be close to unity. In Figure [l4| we 
find that in most of the k± — fc plane the ratio plot is close to 
unity. We note in the lower most fty bin where LOFAR is sensitive 
enough to measure the PS, we find that GMCA removes most of the 
large scale smooth frequency mode and the recovered EoR signal 
on these scales is heavily suppressed. We have therefore avoided 
these low modes in Figure[l4] We find in the range of k scales 
shown in Figure[l4]the PS ratio varies from ~ 0.9 — 1.4, with an 
overall rms scatter of 0.13, and there are some k modes for which 
the ratio is slightly higher. It is interesting to point out here that 
the ML inversion filters noise spatially and GMCA removes the 
smooth foregrounds from the frequency direction. Therefore, when 
comparing the recovered and input PS we have not subtracted any 
noise PS from the GMCA residuals. This may lead to some extra 
noise bias in some of the k modes where the ratio is larger than 
1. In the spherically averaged power spectrum (Figure p~5|) we find 
a trend where the foregrounds are over subtracted at the small-k 
modes and there is a hint of some extra residual noise on the larger 
k modes. Despite these differences at the level of several ten per¬ 
cent, we emphasize that the results in Figure [14] gives us a good 
representation of how well our Bayesian plus GMCA method re¬ 
covers the EoR signal. 


6.1.2 Spherically average power-spectrum 


Next, we average the PS in spherical shells and we computed 
the spherically averaged dimensionless power spectrum, A 2 (k) = 
k 3 P(k)/27r 2 . In Fig.1 


15 


we show the input and recovered spheri¬ 
cally averaged PS with 2 o noise fluctuations. The noise contribu¬ 
tion only includes sample variance and the system noise makes a 
relatively smaller contribution at these low k modes where we are 
most sensitive in recovering the EoR signal. We find the recovered 
EoR signal is mostly within 2er error-bars in the k range shown in 
Fig-[l5] 


7 CONCLUSIONS AND FUTURE WORK 

In this paper, we have introduced a Bayesian framework with spa¬ 
tial regularization to recover the diffuse foregrounds and the red- 
shifted 21-cm HI signal from a set of calibrated (gridded) visibili¬ 
ties. We have shown that gradient regularization with optimal reg- 
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Figure 14. This figure shows the ratio of the recovered and input EoR PS. 
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Figure 15. The spherically averaged 3D PS for the input and recovered EoR 
signal, along with 2<j sample variance . Note some deviations at the higher 
k-values, probably due to left-over noise in the image residuals after the ML 
inversion and GMCA foreground removal. Note that we have not attempted 
to remove the residual noise power-spectrum. 
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ularization constant which maximizes the evidence works well in 
reconstructing the input signal. 

Given the large size of the current data sets, it will be a large 
computational effort to carry out the inversion directly from the cal¬ 
ibrated visibility data sets, but we have demonstrated that gridded 
visibility data can be used effectively in the ML inversion, pro¬ 
viding reduced yf values close to unity. Finer binning gives better 
results, since it provides better approximation to the position of the 
visibilities in the uv plane. However, binning on scales less than 2 
A is not necessary. We note that this is comparable to the current 
(u, v) cell-sizes in LOFAR analyzes. 

We used the non-parametric foreground removal technique, 
GMCA (Chapman et a l.|2013} , to remove the smooth diffuse fore¬ 
grounds. In case of the cylindrical power spectrum, we can mostly 
recover the input EoR signal within a region of 0.14 Mpc -1 < 
fcy < 0.35Mpc -1 and0.03Mpc -1 < k± < 0.19Mpc -1 . The 
high k± and low k\\ modes beyond these limits which are accessi¬ 
ble in LOFAR observations are mostly dominated by system noise 
and the removal of smooth large scale modes in the frequency di¬ 


rection by GMCA. Scales 0.24 Mpc -1 < k < 0.40 Mpc -1 , are 
largely (within 2-sigma) recovered in the spherically averaged 3D 
input EoR power spectrum. Although, at higher k scales the resid¬ 
ual noise has some effects and the recovered EoR level is slightly 
higher compared to the input power spectrum. 

In the near future, we plan to extend our Bayesian imaging 
technique for a full Stokes analysis where we will compare the ML 
solution with those of the more classical ones and precisely quan¬ 
tify the errors and correlation between all model parameters. We 
will also apply our technique to the real LOFAR data where most of 
the point sources have been subtracted and the residual visibilities 
are dominated by the unresolved source confusion noises. Another 
goal is to include direction dependent (such as ionosphere, beam 
etc.) and direction independent calibration errors (such as gain vari¬ 
ation) in predicting the corresponding sky from the measured data. 
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